discrepancy principle


EarlyStopping: Implicit Regularization for Iterative Learning Procedures in Python

arXiv.org Machine Learning

Iterative learning procedures are ubiquitous in machine learning and modern statistics. Regularisation is typically required to prevent the expected loss of a procedure from inflating in later iterations through the propagation of noise inherent in the data. Significant emphasis has been placed on achieving this regularisation implicitly by stopping procedures early. The EarlyStopping-package provides a toolbox of (in-sample) sequential early stopping rules for several well-known iterative estimation procedures, such as truncated SVD, Landweber (gradient descent), conjugate gradient descent, L2-boosting and regression trees. One of the central features of the package is that the algorithms allow the specification of the true data-generating process and keep track of relevant theoretical quantities. In this paper, we detail the principles governing the implementation of the EarlyStopping-package and provide a survey of recent foundational advances in the theoretical literature. We demonstrate how to use the EarlyStopping-package to explore core features of implicit regularisation and replicate results from the literature.
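To make the idea of a sequential early stopping rule concrete, here is a minimal sketch of truncated SVD stopped by the discrepancy principle, assuming the noise level is known. The m-th iterate keeps the m leading singular components, and the procedure stops at the first m whose squared residual falls below the threshold kappa = n * sigma^2. It is written in plain NumPy; the function name `truncated_svd_discrepancy`, the default threshold and the toy operator are illustrative assumptions, not the EarlyStopping-package API.

```python
import numpy as np


def truncated_svd_discrepancy(A, y, noise_level, kappa=None):
    """Truncated SVD iterates stopped by the discrepancy principle.

    Iterate m keeps the m leading singular components of A. We return the first
    iterate whose squared residual ||y - A x_m||^2 is at most kappa, together
    with m. By default kappa = n * noise_level**2 (assumes a known noise level).
    Assumes the singular values of A are non-zero.
    """
    n = A.shape[0]
    if kappa is None:
        kappa = n * noise_level**2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = U.T @ y  # coordinates of y in the left singular basis
    x = np.zeros(A.shape[1])
    for m in range(len(s)):
        if np.sum((y - A @ x) ** 2) <= kappa:
            return x, m
        # add the next component: x += (u_m^T y / s_m) * v_m
        x = x + (coeffs[m] / s[m]) * Vt[m]
    return x, len(s)  # all components used (pseudo-inverse solution)


# Hypothetical example: a diagonal, mildly ill-posed operator with Gaussian noise.
rng = np.random.default_rng(0)
n = 200
A = np.diag(1.0 / np.arange(1, n + 1))
x_true = 1.0 / np.arange(1, n + 1)
sigma = 0.01
y = A @ x_true + sigma * rng.standard_normal(n)
x_hat, m_stop = truncated_svd_discrepancy(A, y, sigma)
print("stopped after", m_stop, "components")
```

Since the squared residual of the truncated SVD iterates is non-increasing in m, the first crossing of the threshold is well defined; the same sequential check can be wrapped around the other iterative procedures listed above.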


Continuous Optimization for Offline Change Point Detection and Estimation

arXiv.org Machine Learning

Change point detection and estimation form an incredibly diverse and widely scattered field in applied and mathematical statistics, with a large variety of applications. To provide a high-level intuition, change point detection may be understood as a signal processing tool for identifying abrupt changes in the generative parameters of a data sequence. While change point detection rests on a well-established line of work, from Page's pioneering work (see Page [1954]) to the rigorous results of Chernoff and Zacks [1964], Lorden [1971] and Sen and Srivastava [1975], many aspects of the problem remain open, and what counts as a good solution depends strongly on the problem at hand (Niu et al. [2016], Truong et al. [2020], and Ma et al. [2020]). Among the open research questions, the simultaneous detection of multiple change points in large data sets is of major interest. Taking an approach motivated by machine learning and data science, in this paper we explore the applicability of recent advances in best subset selection of covariates in linear regression proposed by Moka et al. [2024]. This method, a continuous optimization approach to best subset selection, claims to be faster than existing exhaustive search methods while maintaining comparable accuracy.
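One common way to cast multiple change point detection as best subset selection, sketched here under the assumptions of a piecewise-constant mean and a known number k of change points, is to regress the data on a lower-triangular "step" design in which column j indicates t >= j; a non-zero coefficient on column j >= 1 marks a jump at position j. The sketch solves the subset problem by exhaustive search, the kind of baseline the abstract compares against, and not by the continuous optimization method of Moka et al. [2024]. The function names are hypothetical.

```python
import itertools

import numpy as np


def step_design(n):
    """n x n design whose column j is the indicator of {t >= j}.

    Column 0 acts as the intercept; a non-zero coefficient on column j >= 1 is a
    jump in the mean at position j.
    """
    return np.tril(np.ones((n, n)))


def best_subset_change_points(y, k):
    """Exhaustive best subset search for k change points (intercept always kept).

    Returns the change point positions minimising the residual sum of squares.
    """
    n = len(y)
    X = step_design(n)
    best_rss, best_cps = np.inf, ()
    for cps in itertools.combinations(range(1, n), k):
        Xs = X[:, [0, *cps]]  # intercept plus the candidate jump columns
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_cps = rss, cps
    return best_cps, best_rss


# Hypothetical example: mean 0, then 4, then -2, with Gaussian noise.
rng = np.random.default_rng(1)
signal = np.concatenate([np.zeros(20), np.full(20, 4.0), np.full(20, -2.0)])
y = signal + 0.3 * rng.standard_normal(signal.size)
cps, rss = best_subset_change_points(y, k=2)
print("estimated change points:", cps)
```

The number of candidate subsets, C(n-1, k), grows combinatorially in n and k, which is why a faster continuous relaxation is attractive for large data sets.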


ODE-DPS: ODE-based Diffusion Posterior Sampling for Inverse Problems in Partial Differential Equation

arXiv.org Artificial Intelligence

In recent years we have witnessed a growth in the mathematics of deep learning, which has been applied to inverse problems for partial differential equations (PDEs). However, most deep learning-based inversion methods either require paired data or require retraining the neural networks whenever the conditions of the inverse problem change, which significantly reduces the efficiency of inversion and limits its applicability. To overcome this challenge, in this paper we leverage the score-based generative diffusion model to introduce a novel unsupervised inversion methodology tailored to inverse problems arising from PDEs. Our approach operates within the Bayesian inversion framework, treating the task of solving for the posterior distribution as a conditional generation process, achieved by solving a reverse-time stochastic differential equation. Furthermore, to enhance the accuracy of inversion results, we propose an ODE-based Diffusion Posterior Sampling inversion algorithm. The algorithm stems from the marginal probability density functions of two distinct forward generation processes that satisfy the same Fokker-Planck equation. Through a series of experiments involving various PDEs, we showcase the efficiency and robustness of our proposed method.
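The ODE-based step relies on the fact that a stochastic process and a deterministic ODE (the probability flow ODE) can have identical marginal densities because both satisfy the same Fokker-Planck equation. The toy sketch below illustrates only that fact, under strong simplifying assumptions: one-dimensional Gaussian data N(0, s0^2), the forward SDE dx = dw, and the exact score -x / (s0^2 + t) in place of a learned network. It involves no PDE, no observations and no posterior conditioning, so it is not ODE-DPS itself; all names and parameters are hypothetical.

```python
import numpy as np


def probability_flow_sample(n_samples, s0=1.0, T=10.0, n_steps=1000, seed=0):
    """Draw approximate samples of N(0, s0^2) by integrating a probability flow ODE backwards.

    Forward SDE: dx = dw on [0, T], so x_t ~ N(0, s0^2 + t) with exact score
    -x / (s0^2 + t). The probability flow ODE dx/dt = -0.5 * g(t)^2 * score,
    here with g = 1, has the same marginals as the SDE.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(s0**2 + T), size=n_samples)  # samples from p_T
    h = T / n_steps
    t = T
    for _ in range(n_steps):
        score = -x / (s0**2 + t)  # exact score of p_t (stands in for a learned network)
        velocity = -0.5 * score   # dx/dt of the probability flow ODE
        x = x - h * velocity      # explicit Euler step from t to t - h
        t -= h
    return x


samples = probability_flow_sample(20_000)
print("sample variance:", samples.var())  # close to s0^2 = 1
```

Integrating the ODE backwards maps samples from p_T to samples whose variance is close to s0^2, even though no noise is injected along the way.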


Analyzing the discrepancy principle for kernelized spectral filter learning algorithms

arXiv.org Machine Learning

We investigate the construction of early stopping rules in the nonparametric regression problem where iterative learning algorithms are used and the optimal iteration number is unknown. More precisely, we study the discrepancy principle, as well as modifications based on smoothed residuals, for kernelized spectral filter learning algorithms including gradient descent. Our main theoretical bounds are oracle inequalities established for the empirical estimation error (fixed design), and for the prediction error (random design). From these finite-sample bounds it follows that the classical discrepancy principle is statistically adaptive for slow rates occurring in the hard learning scenario, while the smoothed discrepancy principles are adaptive over ranges of faster rates (resp. higher smoothness parameters). Our approach relies on deviation inequalities for the stopping rules in the fixed design setting, combined with change-of-norm arguments to deal with the random design setting.
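For orientation, here is a minimal fixed-design sketch of the classical discrepancy principle applied to kernel gradient descent (the Landweber filter, one member of the spectral filter family). Fitted values are updated as f_{k+1} = f_k + (eta / n) K (y - f_k), and iteration stops at the first k with (1/n) ||y - f_k||^2 <= kappa * sigma^2, assuming the noise level sigma is known. The Laplace kernel, the step size eta = n / lambda_max(K), the default kappa = 1 and the function names are illustrative assumptions; the smoothed-residual variants studied in the paper are not implemented.

```python
import numpy as np


def laplace_kernel(x, bandwidth):
    """Gram matrix of the Laplace kernel exp(-|x - x'| / bandwidth) on 1-D inputs."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / bandwidth)


def kernel_gd_discrepancy(K, y, sigma, kappa=1.0, max_iter=10_000):
    """Kernel gradient descent on the fitted values, stopped by the discrepancy principle.

    Update: f_{k+1} = f_k + (eta / n) * K @ (y - f_k), with eta = n / lambda_max(K).
    Stop at the first k with mean((y - f_k)^2) <= kappa * sigma^2.
    """
    n = len(y)
    eta = n / np.linalg.eigvalsh(K)[-1]  # eigvalsh sorts ascending, so [-1] is the largest
    f = np.zeros(n)
    for k in range(max_iter):
        if np.mean((y - f) ** 2) <= kappa * sigma**2:
            return f, k
        f = f + (eta / n) * (K @ (y - f))
    return f, max_iter  # threshold not reached within max_iter


# Hypothetical fixed-design example on an equispaced grid.
rng = np.random.default_rng(2)
n, sigma = 100, 0.1
x = np.linspace(0.0, 1.0, n)
f_true = np.sin(2 * np.pi * x)
y = f_true + sigma * rng.standard_normal(n)
K = laplace_kernel(x, bandwidth=0.5)
f_hat, k_stop = kernel_gd_discrepancy(K, y, sigma)
print("stopped at iteration", k_stop)
```

With this step size every eigen-direction of the residual contracts by a factor in [0, 1) per step, so the empirical residual is non-increasing and the stopping index is the first crossing of the threshold.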


A Learning Theory Approach to a Computationally Efficient Parameter Selection for the Elastic Net

arXiv.org Machine Learning

Despite recent advances in regularisation theory, parameter selection remains a challenge in most applications. In a recent work, the framework of statistical learning was used to approximate the optimal Tikhonov regularisation parameter from noisy data. In this work, we improve those results and extend the analysis to elastic net regularisation, providing explicit error bounds on the accuracy of the approximated parameter and of the corresponding regularised solution in a simplified case. Furthermore, in the general case we design a data-driven, automated algorithm for the computation of an approximate regularisation parameter. Our analysis combines statistical learning theory with insights from regularisation theory. We compare our approach with state-of-the-art parameter selection criteria and illustrate its superiority in terms of accuracy and computational time on simulated and real data sets.
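The quantity whose parameter is being selected is the elastic net regularised solution. As a point of reference, here is a minimal sketch of computing it for a fixed parameter pair by proximal gradient descent (ISTA), assuming the parametrisation 0.5 * ||Ax - y||^2 + lam1 * ||x||_1 + 0.5 * lam2 * ||x||^2. The paper may parametrise the penalty differently, and its learning-based selection algorithm is not reproduced here; the function names and the example data are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Componentwise soft thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def elastic_net_ista(A, y, lam1, lam2, n_iter=500):
    """Minimise 0.5*||A x - y||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2 by ISTA.

    With step t = 1 / ||A||_2^2, each step is a gradient step on the quadratic
    data term followed by the proximal map of t*(lam1*||.||_1 + 0.5*lam2*||.||^2),
    which equals soft_threshold(v, t*lam1) / (1 + t*lam2).
    """
    t = 1.0 / np.linalg.norm(A, 2) ** 2  # ||A||_2 is the largest singular value
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - t * grad, t * lam1) / (1.0 + t * lam2)
    return x


# Hypothetical example: sparse ground truth, Gaussian design and noise.
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 10))
x_true = np.array([3.0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
y = A @ x_true + 0.1 * rng.standard_normal(50)
x_hat = elastic_net_ista(A, y, lam1=1.0, lam2=0.5)
print(np.round(x_hat, 3))
```

A parameter selection criterion typically evaluates such a solver over many candidate (lam1, lam2) pairs, which is one reason computational time matters when comparing selection criteria.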